This report is for ETC5521 Assignment 1 by Team Echidna, comprising Ruimin Lin, Rahul Bharadwaj, Ketan Kabu, and Yezi He.

1 Introduction and Motivation

Board games are a form of leisure that people enjoyed long before computers and video games existed, and they have evolved enormously since their inception. Board games give people a way to socialize, reduce stress in a fast-moving society, and exercise the brain. Given that they are such a popular choice of leisure, what makes board games great? Why have board games survived in a world of virtual reality games? In other words, what are the common characteristics of top ranked board games? What are the best board games in terms of average rating?

The original board games data used in this report is obtained from the Board Game Geek database, and is cleaned and shared by Thomas Mock.

The tidy dataset consists of 22 variables (columns) and 10532 observations (rows). It includes data such as maximum/minimum playtime, maximum/minimum number of players, minimum player age, game designer, game publisher, game mechanics, and more. Note that even though the data set is tidy, the values in variables such as category, family, and mechanic are still messy and repetitive, which may limit our ability to explore these variables.

2 Data Description

The aim of this exploratory analysis is to find out which factors affect the average rating of board games. This would give insights into which board games are most popular and which characteristics they share. We have therefore articulated the following questions to guide further exploration of the board games data.

Primary Question:

What are the common characteristics of top ranked board games?

Secondary Questions:

  1. What are the top 10 ranked board games?
  2. How do variables like min/max playtime, min/max players, or min_age affect the average rating?
  3. How have the current best categories of board games evolved over time, in the last few decades, with respect to average rating?
  4. Which game designer was most successful in producing popular games? Which publisher published the most popular games?
  5. Which kind of mechanic is most popular? How do different kinds of mechanic perform?

The variables included in the data are as follows:

  • game_id: ID of a particular game; it is stored as a double vector but should be treated as a character (categorical) vector.

  • description: Game description, a character vector.

  • image: URL image of the game, a character vector.

  • max_players/min_players: maximum/minimum number of recommended players, double vectors.

  • max_playtime/min_playtime: maximum/minimum recommended playtime, double vectors.

  • min_age: recommended minimum player age, a double vector.

  • name: name of the game, a character vector.

  • playing_time: average playtime of a game, a double vector.

  • thumbnail: URL thumbnail of the game, a character vector.

  • year_published: year the game was published, a double vector.

  • artist: artist for game art, a character vector.

  • category: categories of the game, a character vector.

  • compilation: name of compilation, a character vector.

  • designer: game designer, a character vector.

  • expansion: name of expansion pack (if any), a character vector.

  • family: family of game - equivalent to a publisher, a character vector.

  • mechanic: how game is played, a character vector.

  • publisher: company/person who published the game, a character vector.

  • average_rating: average rating from 1 to 10 on the website (Board Game Geek), a double vector.

  • users_rated: number of users who rated the game, a double vector.

To ensure the reliability of the board game ratings, the data is limited to games published between 1950 and 2016 that have at least 50 ratings. The site’s database has more than 90,000 games with crowd-sourced ratings.

The original board games data set consists of 90400 observations and 80 variables, so data cleaning and wrangling are necessary before analysis. Thomas replaced long and complicated variable names in the original data, such as details.description, with shorter ones such as description, using the janitor::clean_names and set_names functions, which avoids messy code. In addition, he eliminated around 50 variables using the select function, which leaves 27 variables at this stage.

The data set is then filtered to board games published from 1950 to 2016 that were rated by at least 50 users. ‘NA’ values in the variable year_published are also omitted. Thomas then excludes variables that may not be useful for the analysis, such as attributes_total and game_type, which ultimately leaves us with a tidy data set (22 variables and 10532 observations) that is relatively concise and convenient for further exploration.
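
The filtering rule described above can be illustrated with a small sketch. It is written in Python with made-up records, and it is not the R code used to prepare the data; the helper name keep_game and the sample values are assumptions for illustration only.

```python
# Illustrative records only: the field names follow the variable list above,
# but the games and values are invented for this sketch.
games = [
    {"name": "A", "year_published": 1949, "users_rated": 120},
    {"name": "B", "year_published": 1995, "users_rated": 49},
    {"name": "C", "year_published": None, "users_rated": 300},
    {"name": "D", "year_published": 2010, "users_rated": 50},
]

def keep_game(game, first_year=1950, last_year=2016, min_ratings=50):
    """Return True if the game passes the year and rating-count filters."""
    year = game["year_published"]
    if year is None:  # drop rows with a missing publication year
        return False
    in_range = first_year <= year <= last_year
    return in_range and game["users_rated"] >= min_ratings

filtered = [g for g in games if keep_game(g)]
print([g["name"] for g in filtered])
```

Only game D survives in this toy example: A is too old, B has too few ratings, and C has no publication year.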

3 Analysis and Findings

3.1 Initial Data Analysis

  • Initial Data Analysis (IDA) is a process that helps one get a feel for the data in question. It provides an overview of the data and gives insights for the subsequent Exploratory Data Analysis (EDA).
  • IDA consists of the data inspection steps carried out after the research plan and data collection have been finished but before formal statistical analyses. The purpose is to minimize the risk of incorrect or misleading results.
  • IDA can be divided into 3 main steps:
    • Data cleaning is the identification of inconsistencies in the data and the resolution of any such issues.
    • Data screening is the description of the data properties.
    • Documentation and reporting preserve the information for the later statistical analysis and models.

Figure 3.1: Visualization of Data Types

  • Figure 3.1 above visualizes the distribution of data types in our dataset, with columns on the x-axis and observations on the y-axis. This gives a concise overview of the data and of which columns are useful for analysis. The plot suggests that we can use all the numeric columns, along with the designer and publisher columns, for our analysis.

Figure 3.2: Visualization of Missing Values

  • Figure 3.2 above shows where values are missing, with columns on the x-axis and the corresponding observations on the y-axis. Each column is also labelled with its percentage of missing values, which comes in handy when deciding which columns not to pick for analysis.

  • It is evident that the following columns have missing values and are not of much use for the analysis:

    • compilation - 96.11% missing
    • expansion - 73.87% missing
    • family - 26.66% missing
    • mechanic - 9.02% missing
  • This is a limitation of the dataset and we frame our questions keeping this in mind.
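
As a rough illustration of how a per-column missing percentage like those listed above can be computed, here is a minimal Python sketch. The rows and the helper name percent_missing are hypothetical, and missing values are assumed to be represented as None; the percentages in the report come from the real data, not from this example.

```python
def percent_missing(rows, column):
    """Percentage of rows whose value in `column` is None (treated as missing)."""
    missing = sum(1 for row in rows if row[column] is None)
    return 100 * missing / len(rows)

# Invented rows for illustration; only the column names come from the report.
rows = [
    {"expansion": None, "mechanic": "Dice Rolling"},
    {"expansion": "Some Expansion", "mechanic": "Hand Management"},
    {"expansion": None, "mechanic": None},
    {"expansion": None, "mechanic": "Tile Placement"},
]

for column in ("expansion", "mechanic"):
    print(column, percent_missing(rows, column))
```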

3.2 Questions of Interest

3.2.1 What are the top 10 ranked board games?

Figure 3.3: Top 10 ranked board games

Game                                             Average rating  Max playtime  Min playtime  Max players  Min players
Small World Designer Edition                     9.00            80            40            6            2
Kingdom Death: Monster                           8.93            180           60            6            1
Terra Mystica: Big Box                           8.85            150           60            5            2
Last Chance for Victory                          8.85            60            60            2            2
The Greatest Day: Sword, Juno, and Gold Beaches  8.83            6000          60            8            2
Last Blitzkrieg                                  8.80            960           180           4            2
Enemy Action: Ardennes                           8.76            600           0             2            1
Through the Ages: A New Story of Civilization    8.74            240           180           4            2
1817                                             8.71            540           360           7            3
Pandemic Legacy: Season 1                        8.67            60            60            4            2
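
A ranking like the one above amounts to sorting the games by average rating in descending order and keeping the first n rows. The following Python sketch shows the idea on four of the games listed in the table; the helper name top_n is hypothetical, and this is not the code used to produce the report.

```python
# (name, average rating) pairs taken from the table above, in arbitrary order.
games = [
    ("Last Blitzkrieg", 8.80),
    ("Small World Designer Edition", 9.00),
    ("Pandemic Legacy: Season 1", 8.67),
    ("Kingdom Death: Monster", 8.93),
]

def top_n(rated_games, n):
    """Return the n games with the highest average rating, best first."""
    return sorted(rated_games, key=lambda game: game[1], reverse=True)[:n]

top_two = top_n(games, 2)
for name, rating in top_two:
    print(name, rating)
```

The same helper with n = 50 would give the wider selection used in the next question.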

3.2.2 How do variables like min/max playtime, min/max players, or min_age affect the average rating in these top-ranked board games?

Figure 3.4: Visualization of Data Types in Top 50 Games

  • Figure 3.4 above shows the distribution of data types in our top 50 games dataset, with column names on the x-axis and the corresponding observations on the y-axis.

  • Our selection of columns is appropriate and there are no missing values in this data, so we need not check for missing values with the vis_miss() function. We can use all these columns for an effective analysis of our questions of interest.

Figure 3.5: Relationship between Maximum Playtime and Average Rating

  • To get a better idea of the common characteristics of top-ranked board games and to ensure the reliability of the results, we have widened the range to the top 50 games.

  • In figure 3.5 we can see a few obvious outliers:

    • The Greatest Day: Sword, Juno, and Gold Beaches with 6000 minutes max. playtime and an average rating of 8.8308
    • Axis Empires: Totaler Krieg! with 3600 minutes max. playtime and an average rating of 8.4194
    • Beyond the Rhine with 3000 minutes max. playtime and an average rating of 8.5979
  • It is difficult to examine trends or common characteristics with these outliers present. We have therefore limited the maximum playtime to values below the upper fence of about 739 minutes, obtained from the IQR outlier rule (lower fence Q1 - 1.5 × IQR, upper fence Q3 + 1.5 × IQR).

## # A tibble: 1 x 6
##   minimum    q1 median  mean    q3 maximum
##     <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
## 1       0  82.5   142.  461.   345    6000
## # A tibble: 1 x 2
##   lower_range upper_range
##         <dbl>       <dbl>
## 1       -311.        739.
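
The lower and upper ranges printed above follow directly from the quartiles in the summary (Q1 = 82.5, Q3 = 345). As a worked check, the short Python sketch below recomputes them and applies the upper fence to a few maximum playtimes; the helper name iqr_fences and the short playtime list are assumptions for illustration.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles taken from the summary output above.
lower, upper = iqr_fences(82.5, 345)
print(lower, upper)  # -311.25 738.75

# A few maximum playtimes in minutes (a subset, for illustration only).
max_playtimes = [80, 180, 600, 960, 3000, 6000]
kept = [t for t in max_playtimes if lower <= t <= upper]
print(kept)  # [80, 180, 600]
```

Because playtime cannot be negative, only the upper fence has any effect here.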

Figure 3.6: Adjusted Maximum Playtime

Now we have a clearer picture of where the majority of the top 50 ranked board games lie in the plot of average rating against maximum playtime. Most of them have a maximum playtime of 200 minutes or less, and the highest-rated board game also lies within this range, at around 100 minutes of maximum playtime. Another thing to notice is that board games with a maximum playtime longer than 600 minutes have comparatively lower ratings.

Nearly half of the highly rated board games are crowded into the range of 0-200 minutes, suggesting that people tend to play board games that do not take up too much leisure time.

Figure 3.7: Minimum Playtime Plot

  • We used the same method as before to omit the outliers. The graph shows that most of the top 50 ranked board games have a minimum playtime of less than 100 minutes.

Figure 3.8: Relationship between Minimum Players and Average Rating

  • In the scatterplot for average rating against minimum players, we observed that most top 50 board games have at least 2 players.

Figure 3.9: Relationship between Maximum Players and Average Rating

  • In the scatterplot for average rating against maximum players, we observed that most top 50 board games have a maximum of 4 or 5 players.

  • Figures 3.8 and 3.9 indicate that the majority of highly rated board games set the number of players between 2 and 4 or 5. This limit suggests that people tend to play board games that fulfil their sense of participation. For example, a board game for 8 players may not be as attractive as a board game for 2 players, because a 2-player game has less downtime than an 8-player game and satisfies each player’s sense of participation.

  • In addition, it is easier to gather a group of 2-4 people interested in playing board games in their leisure time than a group of 8 or more.

Figure 3.10: Minimum Age Plot

  • In the scatterplot of average rating against minimum player age, we observed that the majority of board games set a minimum age between 10 and 15.

Figure 3.11: Summarizing all observations as Violinplots

  • The insights for the top 50 games are summarized in the violin plots above as follows:
    • A maximum of 4 players and a minimum of 2 players is most common in the top 50 games.
    • The maximum and minimum playtime are close to each other and range between 60 and 150 minutes for the top 50 games.

Figure 3.12: Relationship between Average Rating and other Attributes

  • Figure 3.12 above shows how different attributes vary with average rating, which is on the x-axis. These patterns give us a better idea of how the attributes relate to rating.

  • We can observe the following trends for the top 50 rated games as average rating increases:

    • The minimum number of players tends to be around 2. The maximum number of players tends to be around 4 and increases up to 6.
    • The minimum playtime tends to vary between 60 and 500 minutes. The maximum playtime tends to vary between 150 and 1000 minutes.

Figure 3.13: Minimum Age Plots

  • We can observe the following for the attribute Minimum Age:

    • Most of the top 50 games set a minimum age between 10 and 15 years.
    • We can observe from the trend that games are more popular among the 7-13 year old age group.

3.2.3 How have the current best categories evolved over time, with respect to average rating?

Important note: the majority of the games are classified into multiple categories at once. For example, the game Top Trumps is classified under about 14 categories. The number of games under each category is therefore an overlapping count, as one game is counted under several categories. If one were to add up the percentage of games under every category, the total would be higher than 100%.

Even so, the counts give us a good understanding of the popularity of each category, and they can help new game developers decide which category to target for their upcoming games.
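
The overlap described above can be made concrete with a small sketch. It assumes that each game's categories are stored as a single comma-separated string (an assumption about the data layout) and uses three invented games; it is written in Python and is not the report's own code.

```python
from collections import Counter

# Invented games; categories are assumed to be comma-separated strings.
games = {
    "Game A": "Wargame,World War II",
    "Game B": "Wargame",
    "Game C": "Card Game,Wargame,Fantasy",
}

# Count every category a game belongs to, so one game can be counted several times.
counts = Counter()
for categories in games.values():
    for category in categories.split(","):
        counts[category.strip()] += 1

# Share of games in each category, as a percentage of all games.
shares = {category: 100 * n / len(games) for category, n in counts.items()}
total_share = sum(shares.values())

print(counts.most_common())
print(round(total_share))  # well above 100 because memberships overlap
```

With six category memberships spread over three games, the shares add up to about 200% rather than 100%.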

Figure 3.14: Top 10 categories of board games in 2016. The length of each bar along the x-axis represents the average rating of a category, with the top 10 categories mapped on the y-axis. Numbers to the right of the bars give the number of games in each category launched in 2016. War-type games are clearly the most liked, with 43 games in the Wargame category in 2016 alone.

  • Numbers on the right-hand side of the bars represent the number of games of that category published in 2016. This gives us a better understanding of the ratings, as some averages are spread over a large number of games even if the category ranks relatively lower, as with the Wargame and Miniatures categories. They had 43 and 29 games published in 2016 respectively, yet still rank 6th and 8th in the top 10. The Vietnam War category had just one game published but was rated quite highly (7.75). The American Indian Wars category had 2 games published in 2016 and was the highest rated category, with an average rating of 7.85.

  • It is clear from the above graph that American Indian Wars category has the highest rating in the most recent yearly data available (2016).

  • An interesting point about human behavior and how we like to pass our free time is that the top 7 of those top 10 categories of board games are war related.

  • For anyone reading this report in a relevant field, it would be an interesting psychological experiment to understand why we, as a species, are so drawn to war-like scenarios even during our innocent board game time.

  • Additionally, a new game developer brainstorming ideas for new games to launch could draw the insight from this chart that war-based games are popular.

Let us now consider how the categories in figure 3.14 above have evolved over time. We can look at a timeline of these board game categories and see whether any trends in their likability or acceptance emerge.

Figure 3.15: Ratings of the current top 10 board game categories over the last 50+ years. Decade and rating are mapped on the x and y axes respectively. Numbers on the graph represent the number of games published each decade in that category. Each facet represents a game category in the 2016 top 10. A steep rise in likability since the last decade is visible for all categories, with a dip for nearly all categories in the 1980s. Note that the number of games in the 2010s only covers years up to 2016.

  • An initial look at the figure 3.15 above indicates that nearly all these 2016 top 10 categories have undergone significant rise in ratings in the last decade.

  • Nearly all war-themed categories (except Modern Warfare) underwent a drop in ratings in the 1980s, then rose sharply and kept rising until 2016. World War did not see a drop in ratings in the 80s, but its growth plateaued as well.

  • The Vietnam War (the actual war) lasted 2 decades, from the 1950s to the 70s. The first game introduced in this category had a rating just above 7 around the time the war ended. Ratings dropped, as with most war games, in the 1980s, and then rose again to make it the second highest rated category as of 2016.

  • When we look at the number of games published for each category and each decade, it is of note that the top rated category in 2016, American Indian Wars, had only 1 game in the 1990s, when it dropped to its lowest point. This was followed by 2 and then 6 games in the following decades, when it rose to the top of the rankings.

  • There were 545 games of the Wargame category published in 2000s, followed by a drop in numbers to 424 in 2010s, but ratings rose to nearly 7.5.
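
The per-decade view discussed above comes down to mapping each publication year to its decade and then summarizing ratings within each category and decade. Below is a minimal Python sketch of that grouping; the records are invented, the helper name decade_of is hypothetical, and the figures in the report come from the real data rather than from this example.

```python
from collections import defaultdict

def decade_of(year):
    """Map a year to the first year of its decade, e.g. 2016 -> 2010."""
    return year // 10 * 10

# (category, year published, average rating): invented values for illustration.
records = [
    ("Wargame", 1983, 6.0),
    ("Wargame", 1987, 6.5),
    ("Wargame", 2012, 7.5),
    ("Wargame", 2016, 7.0),
]

# Collect the ratings of all games in the same category and decade.
ratings = defaultdict(list)
for category, year, rating in records:
    ratings[(category, decade_of(year))].append(rating)

# For each group, keep the number of games and their mean rating.
summary = {
    key: (len(values), sum(values) / len(values))
    for key, values in ratings.items()
}
for key in sorted(summary):
    print(key, summary[key])
```

The game count per group corresponds to the numbers printed on the graph, and the mean corresponds to the plotted rating.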

4 Conclusion

From the above analysis, we can see that the top ranked board games have the following common characteristics. A maximum of 4 players and a minimum of 2 players is most common in the top 50 games. The maximum and minimum playtime are close to each other and range between 60 and 150 minutes for the top 50 games. The minimum age set by the majority of board games is between 10 and 15. The American Indian Wars category has the highest rating in the most recent yearly data available. Philippe Keyaerts has the highest rated game, and Multi-Man Publishing is one of the best publishers. Hex-and-Counter is the most popular mechanic.
